Paraphrase Acquisition for Information Extraction
نویسندگان
چکیده
We are trying to find paraphrases from Japanese news articles which can be used for Information Extraction. We focused on the fact that a single event can be reported in more than one article in different ways. However, certain kinds of noun phrases such as names, dates and numbers behave as “anchors” which are unlikely to change across articles. Our key idea is to identify these anchors among comparable articles and extract portions of expressions which share the anchors. This way we can extract expressions which convey the same information. Obtained paraphrases are generalized as templates and stored for future use. In this paper, first we describe our basic idea of paraphrase acquisition. Our method is divided into roughly four steps, each of which is explained in turn. Then we illustrate several issues which we encounter in real texts. To solve these problems, we introduce two techniques: coreference resolution and structural restriction of possible portions of expressions. Finally we discuss the experimental results and conclusions.
منابع مشابه
Using Discourse Information for Paraphrase Extraction
Previous work on paraphrase extraction using parallel or comparable corpora has generally not considered the documents’ discourse structure as a useful information source. We propose a novel method for collecting paraphrases relying on the sequential event order in the discourse, using multiple sequence alignment with a semantic similarity measure. We show that adding discourse information boos...
متن کاملInvestigating a Generic Paraphrase-Based Approach for Relation Extraction
Unsupervised paraphrase acquisition has been an active research field in recent years, but its effective coverage and performance have rarely been evaluated. We propose a generic paraphrase-based approach for Relation Extraction (RE), aiming at a dual goal: obtaining an applicative evaluation scheme for paraphrase acquisition and obtaining a generic and largely unsupervised configuration for RE...
متن کاملLarge Scale Acquisition of Paraphrases for Learning Surface Patterns
Paraphrases have proved to be useful in many applications, including Machine Translation, Question Answering, Summarization, and Information Retrieval. Paraphrase acquisition methods that use a single monolingual corpus often produce only syntactic paraphrases. We present a method for obtaining surface paraphrases, using a 150GB (25 billion words) monolingual corpus. Our method achieves an accu...
متن کاملAutomatic Acquisition of Context-Specific Lexical Paraphrases
Lexical paraphrasing aims at acquiring word-level paraphrases. It is critical for many Natural Language Processing (NLP) applications, such as Question Answering (QA), Information Extraction (IE), and Machine Translation (MT). Since the meaning and usage of a word can vary in distinct contexts, different paraphrases should be acquired according to the contexts. However, most of the existing res...
متن کاملMIPA: Mutual Information Based Paraphrase Acquisition via Bilingual Pivoting
We present a pointwise mutual information (PMI) based approach for formalizing paraphrasability and propose a variant of PMI, called mutual information based paraphrase acquisition (MIPA), for paraphrase acquisition. Our paraphrase acquisition method first acquires lexical paraphrase pairs by bilingual pivoting and then reranks them by PMI and distributional similarity. The complementary nature...
متن کامل